Video encoding

ABSTRACT

An embodiment is directed to a method for selecting a predictive macroblock partition from a plurality of candidate macroblock partitions in motion estimation and compensation in a video encoder including determining a bit rate signal for each of the candidate macroblock partitions, generating a distortion signal for each of the candidate macroblock partitions, calculating a cost for each of the candidate macroblock partitions based on respective bit rate and distortion signals to produce a plurality of costs, and determining a motion vector from the costs. The motion vector designates the predictive macroblock partition.

BACKGROUND

1. Field

The present application relates to video encoders and cost functions employed therein.

2. Background

Video compression involves compression of digital video data. Video compression is used for efficient coding of video data in video file formats and streaming video formats. Compression is a reversible conversion of data to a format with fewer bits, usually performed so that the data can be stored or transmitted more efficiently. If the inverse of the process, decompression, produces an exact replica of the original data then the compression is lossless. Lossy compression, usually applied to image data, does not allow reproduction of an exact replica of the original image, but it is more efficient. While lossless video compression is possible, in practice it is virtually never used. Standard video data rate reduction involves discarding data.

Video is basically a three-dimensional array of color pixels. Two dimensions serve as spatial (horizontal and vertical) directions of the moving pictures, and one dimension represents the time domain.

A frame is a set of all pixels that (approximately) correspond to a single point in time. Basically, a frame is the same as a still picture. However, in interlaced video, the set of horizontal lines with even numbers and the set with odd numbers are grouped together in fields. The term “picture” can refer to a frame or a field.

Video data contains spatial and temporal redundancy. Similarities can thus be encoded by merely registering differences within a frame (spatial) and/or between frames (temporal). Spatial encoding is performed by taking advantage of the fact that the human eye is unable to distinguish small differences in color as easily as it can changes in brightness, and so very similar areas of color can be “averaged out.” With temporal compression, only the changes from one frame to the next are encoded because a large number of the pixels will often be the same on a series of frames.

Video compression typically reduces this redundancy using lossy compression. Usually this is achieved by (a) image compression techniques to reduce spatial redundancy from frames (this is known as intraframe compression or spatial compression) and (b) motion compensation and other techniques to reduce temporal redundancy (known as interframe compression or temporal compression).

H.264/AVC is a video compression standard resulting from joint efforts of ISO (International Organization for Standardization) and ITU (International Telecommunication Union). FIG. 1 shows a block diagram for an H.264/AVC encoder. An input video frame 102 is divided into macroblocks 104 and fed into system 100. For each macroblock 104, a predictor 132 is generated and subtracted from the original macroblock 104 to generate a residual 107. This residual 107 is then transformed 108 and quantized 110. The quantized macroblock is then entropy coded 112 to generate a compressed bitstream 113. The quantized macroblock is also inverse-quantized 114, inverse-transformed 116 and added back to the predictor by adder 118. The reconstructed macroblock is filtered on the macroblock edges with a deblocking filter 120 and then stored in memory 122.

Quantization, in principle, involves reducing the dynamic range of the signal. This impacts the number of bits (rate) generated by entropy coding. This also introduces loss in the residual, which causes the original and reconstructed macroblock to differ. This loss is normally referred to as quantization error (distortion). The strength of quantization is determined by a quantization factor parameter. The higher the quantization parameter, the higher the distortion and the lower the rate.

As discussed above, the predictor can be of two types: intra 128 and inter 130. Spatial estimation 124 looks at the neighboring macroblocks in a frame to generate the intra predictor 128 from among multiple choices. Motion estimation 126 looks at the previous/future frames to generate the inter predictor 130 from among multiple choices. The inter predictor aims to reduce temporal redundancy. Typically, reducing temporal redundancy has the biggest impact on reducing rate.

Motion estimation may be one of the most computationally expensive blocks in the encoder because of the huge number of potential predictors it has to choose from. Practically, motion estimation involves searching for the inter predictor in a search area comprising a subset of the previous frames. Potential predictors or candidates from the search area are examined on the basis of a cost function or metric. Once the metric is calculated for all the candidates in the search area, the candidate that minimizes the metric is chosen as the inter predictor. Hence, the main factors affecting motion estimation are: search area size, search methodology, and cost function.

Focusing particularly on cost function, a cost function essentially quantifies the redundancy between the original block of the current frame and a candidate block of the search area. The redundancy should ideally be quantified in terms of accurate rate and distortion.

The cost function employed in current motion estimators is Sum-of-Absolute-Difference (SAD). FIG. 2 shows how SAD is calculated. Frame(t) 206 is the current frame containing a macroblock 208 which is stored in Encode MB (MACROBLOCK) RAM 212. Frame(t-1) 202 is the previous frame containing a search area 204 which is stored in Search RAM 210. It is appreciated that more than one previous frame can be used.

In the example in FIG. 2, the search area 204 size is M×N. Let the size of the blocks being considered be A×B, where A and B are defined in Table 1. Let the given block 208 from the current frame 206 be denoted as c 215. Let each candidate from the search area 204 be denoted as p(x,y) 214, where x ∈ [0,N] and y ∈ [0,M]. (x,y) represents a position in the search area 204.

TABLE 1
Notations for e(x,y) for different block shapes

A    B    $z \in \left[0, \frac{A \times B}{16} - 1\right]$    Notation
4    4    z ∈ [0, 0]     $e(x,y) = \begin{bmatrix} e(x,y,0) \end{bmatrix}$
8    4    z ∈ [0, 1]     $e(x,y) = \begin{bmatrix} e(x,y,0) & e(x,y,1) \end{bmatrix}$
4    8    z ∈ [0, 1]     $e(x,y) = \begin{bmatrix} e(x,y,0) \\ e(x,y,1) \end{bmatrix}$
8    8    z ∈ [0, 3]     $e(x,y) = \begin{bmatrix} e(x,y,0) & e(x,y,1) \\ e(x,y,2) & e(x,y,3) \end{bmatrix}$
16   8    z ∈ [0, 7]     $e(x,y) = \begin{bmatrix} e(x,y,0) & e(x,y,1) & e(x,y,2) & e(x,y,3) \\ e(x,y,4) & e(x,y,5) & e(x,y,6) & e(x,y,7) \end{bmatrix}$
8    16   z ∈ [0, 7]     $e(x,y) = \begin{bmatrix} e(x,y,0) & e(x,y,1) \\ e(x,y,2) & e(x,y,3) \\ e(x,y,4) & e(x,y,5) \\ e(x,y,6) & e(x,y,7) \end{bmatrix}$
16   16   z ∈ [0, 15]    $e(x,y) = \begin{bmatrix} e(x,y,0) & e(x,y,1) & e(x,y,2) & e(x,y,3) \\ e(x,y,4) & e(x,y,5) & e(x,y,6) & e(x,y,7) \\ e(x,y,8) & e(x,y,9) & e(x,y,10) & e(x,y,11) \\ e(x,y,12) & e(x,y,13) & e(x,y,14) & e(x,y,15) \end{bmatrix}$

The following steps are calculated to get a motion vector (X,Y):

-   c is motion compensated 216 for by p(x,y) 214 to get a residual error signal 218, e(x,y):
    e(x,y) = p(x,y) − c   (1)
-   SAD 222 is then calculated 220 from e(x,y):
    $\mathrm{SAD}(x,y) = \sum_{i,j} |e(x,y)|, \quad i \in [0,A],\ j \in [0,B] \qquad (2)$
-   The motion vector (X,Y) is then calculated from SAD(x,y):
    (X,Y) = (x,y) | min SAD(x,y)   (3)
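The steps (1) through (3) can be sketched in Python as follows. This is a minimal illustration only, assuming an exhaustive search over every candidate position that fits inside the search area; the function names `sad` and `sad_motion_search` and the sample pixel values are hypothetical and are not part of the described encoder.

```python
def sad(candidate, current):
    """Sum of absolute differences between two equally sized blocks, per (1) and (2)."""
    return sum(
        abs(p - c)
        for cand_row, cur_row in zip(candidate, current)
        for p, c in zip(cand_row, cur_row)
    )


def sad_motion_search(search_area, current):
    """Return ((X, Y), lowest SAD) over all candidate positions, per (3)."""
    block_h, block_w = len(current), len(current[0])
    area_h, area_w = len(search_area), len(search_area[0])
    best = None
    for y in range(area_h - block_h + 1):
        for x in range(area_w - block_w + 1):
            # p(x, y): the candidate block whose top-left corner is at (x, y)
            candidate = [row[x:x + block_w] for row in search_area[y:y + block_h]]
            cost = sad(candidate, current)
            if best is None or cost < best[1]:
                best = ((x, y), cost)
    return best


# Illustrative data: the current block appears unchanged at position (1, 1).
search_area = [
    [0, 0, 0, 0],
    [0, 5, 6, 0],
    [0, 7, 8, 0],
    [0, 0, 0, 0],
]
current = [[5, 6], [7, 8]]
motion_vector, best_sad = sad_motion_search(search_area, current)
```

In this sketch the selection depends only on the pixel differences, which is the limitation of SAD discussed in the following paragraph.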

Ideally, the predictor macroblock partition should be the macroblock partition that most closely resembles the macroblock. One of the drawbacks of SAD is that it does not specifically and accurately account for Rate and Distortion. Hence the redundancy is not quantified accurately, and therefore it is possible that the predictive macroblock partition chosen is not the most efficient choice. Thus, in some cases utilizing a SAD approach may actually result in less than optimal performance.

SUMMARY

One embodiment relates to a method for selecting a predictive macroblock partition in motion estimation and compensation in a video encoder including determining a bit rate signal, generating a distortion signal, calculating a cost based on the bit rate signal and the distortion signal, and determining a motion vector from the cost. The motion vector designates the predictive macroblock partition. The method may be implemented in a mobile device such as a mobile phone, digital organizer or lap top computer.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a block diagram of an H.264/AVC video encoder.

FIG. 2 shows a block diagram of the sum of absolute difference cost function as employed in a standard video encoder.

FIG. 3 shows a block diagram of the theoretically optimal solution Rate-Distortion-optimized cost function for a video encoder.

FIG. 4 shows a block diagram of a Rate-Distortion-optimized cost function for a video encoder.

FIG. 5 shows a first graphical illustration of the performance of a video encoder using the sum of absolute difference cost function as compared to a video encoder using a cost function.

FIG. 6 shows a second graphical illustration of the performance of a video encoder using the sum of absolute difference cost function as compared to a video encoder using a cost function.

DETAILED DESCRIPTION

Reference will now be made in detail to some embodiments, examples of which are illustrated in the accompanying drawings. It will be understood that the embodiments are not intended to limit the description. On the contrary, the description is intended to cover alternatives, modifications and equivalents, which may be included within the spirit and scope of the description as defined by the claims. Furthermore, in the detailed description, numerous specific details are set forth in order to provide a thorough understanding. However, it may be obvious to one of ordinary skill in the art that the present description may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to unnecessarily obscure aspects of the present description.

Some portions of the detailed descriptions that follow are presented in terms of procedures, logic blocks, processing, and other symbolic representations of operations on data bits within a computer or digital system memory. These descriptions and representations are means used by those skilled in the data processing arts to effectively convey the substance of their work to others skilled in the art. A procedure, logic block, process, etc., is herein, and generally, conceived to be a sequence of steps or instructions leading to a desired result.

Unless specifically stated otherwise as apparent from the discussion herein, it is understood that throughout discussions of the embodiments, discussions utilizing terms such as “determining” or “outputting” or “transmitting” or “recording” or “locating” or “storing” or “displaying” or “receiving” or “recognizing” or “utilizing” or “generating” or “providing” or “accessing” or “checking” or “notifying” or “delivering” or the like, refer to the action and processes of a computer system, or similar electronic computing device, that manipulates and transforms data. The data is represented as physical (electronic) quantities within the computer system's registers and memories and is transformed into other data similarly represented as physical quantities within the computer system memories or registers or other such information storage, transmission, or display devices.

In general, embodiments of the description below subject candidate macroblock partitions to a series of processes that approximate the processes the macroblock partition would undergo were it actually selected as the predictive macroblock partition (see generally FIG. 1). Doing so allows for accurate approximation of the rate and distortion for each candidate macroblock partition. Embodiments then employ a Lagrangian-based cost function, rather than SAD, to select the candidate macroblock partition that best minimizes costs associated with rate and distortion that occur during the encoding process.

An optimal solution to predictive macroblock partition selection needs to be established to understand where SAD stands and the scope of the gains possible. The optimal solution will guarantee minimum distortion (D) under a rate (R) constraint. Such a solution is found using Lagrangian-based optimization, which combines Rate and Distortion as D+λR. λ is the Lagrangian multiplier that represents the tradeoff between rate and distortion. FIG. 3 shows a block diagram 300 of the optimal solution for an RD-optimized cost function. In order to calculate rate and distortion accurately, the entire encoding process should be carried out for each of the candidate blocks 304, as shown in FIG. 3 and described below.

The current frame 308 is motion compensated 310 for by the candidate macroblock partitions 304 to get a residual error signal e(x,y) as shown in (1).

e(x,y) is divided into an integral number of 4×4 blocks e(x,y,z) 312, where $z \in \left[0, \frac{A \times B}{16} - 1\right]$.

The size of e(x,y) is A×B. The values that A and B can take are shown in Table 1. Let e(x,y,z) be denoted by E.
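The division of e(x,y) into 4×4 blocks can be sketched as follows. This is an illustrative sketch only, assuming that A is the width and B is the height of the residual and that z numbers the blocks left to right and then top to bottom, which is how the matrices in Table 1 read; the function name `split_into_4x4` is hypothetical.

```python
def split_into_4x4(e):
    """Split a residual block (list of pixel rows) into 4x4 sub-blocks e(x, y, z).

    The returned list is indexed by z, assuming raster order of the sub-blocks.
    """
    height, width = len(e), len(e[0])
    blocks = []
    for top in range(0, height, 4):
        for left in range(0, width, 4):
            blocks.append([row[left:left + 4] for row in e[top:top + 4]])
    return blocks


# An 8x4 residual (A = 8, B = 4) yields (A x B) / 16 = 2 sub-blocks.
e_8x4 = [[0, 0, 0, 0, 1, 1, 1, 1] for _ in range(4)]
blocks_8x4 = split_into_4x4(e_8x4)

# A 16x16 residual whose pixels are labelled with the expected z of their sub-block.
e_16x16 = [[(r // 4) * 4 + (c // 4) for c in range(16)] for r in range(16)]
blocks_16x16 = split_into_4x4(e_16x16)
```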

E 312 is transformed 314 into the frequency domain from the spatial domain. Let the transformed block be denoted as t(x,y,z) or T 316. Since the transform is separable, it is applied in two stages, horizontal (4) and vertical (5) on E 312. E′ represents the intermediate output. D represents the transform matrix shown in (6).

$E'(i,j) = \sum_{k=0}^{3} E(i,k) \times D(k,j), \quad i,j \in [0,3] \qquad (4)$

$T(i,j) = \sum_{k=0}^{3} E'(k,j) \times D(i,k), \quad i,j \in [0,3] \qquad (5)$

$D = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 2 & 1 & -1 & -2 \\ 1 & -1 & -1 & 1 \\ 1 & -2 & 2 & -1 \end{bmatrix} \qquad (6)$

T 316 is quantized 318 with a quantization parameter Q, which is predetermined. Let the quantized block be denoted by l(x,y,z) or L 320.

$L(i,j) = (T(i,j) \times M(i,j) + R) \gg S, \quad i,j \in [0,3] \qquad (7)$

$M = \begin{bmatrix} f & g & f & g \\ g & h & g & h \\ f & g & f & g \\ g & h & g & h \end{bmatrix} \qquad (8)$

$S = 15 + \frac{Q}{6} \qquad (9)$

$R = \frac{2^{S}}{3} \qquad (10)$

The values for the elements of M are derived from a table known in the art. A sample of the table is shown in Table 2.

TABLE 2
Values for Multiplication Factor (M) for H.264 Quantization

Q %6    f        g       h
0       13107    8066    5243
1       11916    7490    4660
2       10082    6554    4194
3       9362     5825    3647
4       8192     5243    3355
5       7282     4559    2893
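The quantization of (7) through (10) with the values of Table 2 can be sketched as follows. This is a hedged illustration: it assumes that Q/6 in (9) and the division in (10) are integer divisions, and that the quantizer is applied to the magnitude of each coefficient with the sign restored afterwards, none of which is stated explicitly above. The names `M_TABLE`, `multiplier` and `quantize` are hypothetical.

```python
# Table 2: Q % 6 -> (f, g, h)
M_TABLE = {
    0: (13107, 8066, 5243),
    1: (11916, 7490, 4660),
    2: (10082, 6554, 4194),
    3: (9362, 5825, 3647),
    4: (8192, 5243, 3355),
    5: (7282, 4559, 2893),
}


def multiplier(i, j, q):
    """Pick f, g or h for position (i, j), following the layout of matrix (8)."""
    f, g, h = M_TABLE[q % 6]
    if i % 2 == 0 and j % 2 == 0:
        return f
    if i % 2 == 1 and j % 2 == 1:
        return h
    return g


def quantize(T, q):
    """Quantize a 4x4 transformed block T with quantization parameter q."""
    s = 15 + q // 6      # (9), assuming integer division
    r = (1 << s) // 3    # (10), assuming integer division
    L = []
    for i in range(4):
        row = []
        for j in range(4):
            magnitude = (abs(T[i][j]) * multiplier(i, j, q) + r) >> s  # (7)
            # Assumption: the sign is restored after quantizing the magnitude.
            row.append(-magnitude if T[i][j] < 0 else magnitude)
        L.append(row)
    return L


# Illustrative coefficients only.
T = [
    [10, 10, 0, 0],
    [0, 10, 0, 0],
    [0, 0, 0, 0],
    [-10, 0, 0, 0],
]
L = quantize(T, 0)
```

The same input coefficient value of 10 quantizes to 4, 2 or 1 depending on whether its position selects f, g or h, and a larger Q increases S and so reduces the levels further.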

Next, L 320 is entropy coded 328 using a context-adaptive variable length coding (CAVLC) scheme. This generates the number of bits taken to represent l(x,y,z), which is denoted as Rate(x,y,z,Q) or Rate(Q) 332.

Rate(x,y,z,Q) = CAVLC(l(x,y,z,Q))   (11)

It should be appreciated by one skilled in the art that CAVLC is known in the art and that another entropy coding algorithm may be used in its place.

L 320 is inverse quantized 322 with quantization parameter Q. Let the inverse quantized block be denoted by $\hat{l}(x,y,z)$ or $\hat{L}$ 324.

$\hat{L}(i,j) = (L(i,j) \times \hat{M}(i,j)) \ll S, \quad i,j \in [0,3] \qquad (12)$

$\hat{M} = \begin{bmatrix} \hat{f} & \hat{g} & \hat{f} & \hat{g} \\ \hat{g} & \hat{h} & \hat{g} & \hat{h} \\ \hat{f} & \hat{g} & \hat{f} & \hat{g} \\ \hat{g} & \hat{h} & \hat{g} & \hat{h} \end{bmatrix} \qquad (13)$

The values for the elements of $\hat{M}$ are derived from a table known in the art. A sample of the table is shown in Table 3.

TABLE 3
Values for Multiplication Factor ($\hat{M}$) for H.264 Inverse Quantization

Q %6    $\hat{f}$    $\hat{g}$    $\hat{h}$
0       10           13           16
1       11           14           18
2       13           16           20
3       14           18           23
4       16           20           25
5       18           23           29

$\hat{L}$ is transformed from the frequency domain to the spatial domain 326. Let the transformed block be denoted by $\hat{e}(x,y,z,Q)$ or $\hat{E}$ 329. Since the transform is separable, it is applied in two stages, horizontal (14) and vertical (15), on $\hat{L}$. L′ represents the intermediate output. $\hat{D}$ represents the transform matrix shown in (16).

$L'(i,j) = \sum_{k=0}^{3} \hat{L}(i,k) \times \hat{D}(k,j), \quad i,j \in [0,3] \qquad (14)$

$\hat{E}(i,j) = \sum_{k=0}^{3} L'(k,j) \times \hat{D}(i,k), \quad i,j \in [0,3] \qquad (15)$

$\hat{D} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & 1/2 & -1/2 & -1 \\ 1 & -1 & -1 & 1 \\ 1/2 & -1 & 1 & -1/2 \end{bmatrix} \qquad (16)$

The squared-error between $\hat{E}$ and E represents the Distortion, Distortion(x,y,z,Q) or Distortion(Q).

$\mathrm{Distortion}(x,y,z,Q) = \sum_{i,j} \left( E(i,j) - \hat{E}(i,j) \right)^{2}, \quad i,j \in [0,3] \qquad (17)$

The Lagrangian cost Cost4×4(x,y,z,Q,λ) is calculated for a predefined λ.

Cost4×4(x,y,z,Q,λ) = Distortion(x,y,z,Q) + λ×Rate(x,y,z,Q)   (18)

The total cost for p(x,y) is given by:

$\mathrm{Cost}(x,y,Q,\lambda) = \sum_{z} \mathrm{Cost4{\times}4}(x,y,z,Q,\lambda), \quad z \in \left[0, \frac{A \times B}{16} - 1\right] \qquad (19)$

The motion vector (X,Y) is then calculated as follows.

(X,Y) = (x,y) | min Cost(x,y,Q,λ)   (20)
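The selection in (18) through (20) can be sketched as follows, assuming the distortion and rate of every 4×4 block of every candidate have already been computed. The function names and the numeric values are hypothetical and chosen only to show how λ shifts the selection between a low-rate and a low-distortion candidate.

```python
def block_cost(distortion, rate, lam):
    """Lagrangian cost of one 4x4 block, per (18)."""
    return distortion + lam * rate


def total_cost(blocks, lam):
    """Total cost of one candidate p(x, y): sum over its 4x4 blocks, per (19)."""
    return sum(block_cost(d, r, lam) for d, r in blocks)


def select_motion_vector(candidates, lam):
    """Return the position (x, y) with the lowest total cost, per (20)."""
    return min(candidates, key=lambda pos: total_cost(candidates[pos], lam))


# Hypothetical per-block (distortion, rate) pairs for two candidate positions.
candidates = {
    (0, 0): [(40, 10), (35, 12)],  # higher distortion, fewer bits
    (1, 0): [(20, 30), (25, 28)],  # lower distortion, more bits
}
mv_small_lambda = select_motion_vector(candidates, 0.5)
mv_large_lambda = select_motion_vector(candidates, 2)
```

With a small λ the lower-distortion candidate is selected; with a larger λ the rate term dominates and the cheaper candidate is selected instead.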

The optimal solution just described may be too complex to be practical even though it provides the best solution possible. Embodiments of the present description introduce a new cost function that represents a computational approximation of the optimal solution. This computational approximation may have an insignificant impact on the results of the optimal solution while significantly reducing the complexity of the same.

FIG. 4 shows a block diagram 400 of an embodiment. Just as with the optimal solution, the current frame 408 is motion compensated 410 for by the candidate macroblock partitions 406 to get the residual error signal, e(x,y). e(x,y) is then divided into an integral number of four by four blocks, e(x,y,z) or E 412, as shown in Table 1. E 412 is then transformed 414 into the frequency domain from the spatial domain as shown in (4), (5), and (6) to get t(x,y,z) or T 416.

According to the optimal solution, T would now be quantized. However, the quantization process is computationally complex because it involves multiplication and other complex binary functions. Thus in one embodiment, the multiplication of T and M from (7) is approximated through a series of shifts and adds as follows:

$M(i,j) \times T(i,j) = \left( (T(i,j) \ll a) + \mathrm{Sign}(T(i,j) \ll b, \bar{b}) + \mathrm{Sign}(T(i,j) \ll c, \bar{c}) \right) \gg d \qquad (21)$

(7) can be rewritten as the quantization approximation 418:

$L(i,j) = \left( \left( (T(i,j) \ll a) + \mathrm{Sign}(T(i,j) \ll b, \bar{b}) + \mathrm{Sign}(T(i,j) \ll c, \bar{c}) \right) \gg d + R \right) \gg S \qquad (22)$

-   -   where i,j ∈ [0,3], and where the second argument of Sign( ) is a sign value that is 1 when the term is negative and 0 when it is positive

S and R can be determined from (9) and (10). The multiplication factor M is approximated with $\tilde{M}$. The values of a, b, c, d, $\bar{b}$, and $\bar{c}$ are found in Table 4 and Table 5 for a corresponding first quantization approximation parameter and corresponding elements of the approximate multiplication factor $\tilde{M}$.

$\tilde{M} = \begin{bmatrix} \tilde{f} & \tilde{g} & \tilde{f} & \tilde{g} \\ \tilde{g} & \tilde{h} & \tilde{g} & \tilde{h} \\ \tilde{f} & \tilde{g} & \tilde{f} & \tilde{g} \\ \tilde{g} & \tilde{h} & \tilde{g} & \tilde{h} \end{bmatrix} \qquad (23)$

TABLE 4
(a, b, c, d) shift values for a given value of the Multiplication Approximation Factor ($\tilde{M}$) for quantization approximation

         $\tilde{f}$          $\tilde{g}$          $\tilde{h}$
QP %6    a    b    c    d     a    b    c    d     a    b    c    d
0        11   13   14   1     13   0    7    0     12   10   7    0
1        13   14   10   1     14   10   9    1     12   9    0    0
2        13   11   0    0     12   11   9    0     12   7    0    0
3        13   10   0    0     14   13   10   2     12   0    9    0
4        13   0    0    0     14   12   9    2     11   10   8    0
5        13   7    10   0     12   9    0    0     11   9    8    0

TABLE 5
($\bar{b}$, $\bar{c}$) sign values for a given value of the Multiplication Approximation Factor ($\tilde{M}$) for quantization approximation

         $\tilde{f}$              $\tilde{g}$              $\tilde{h}$
QP %6    $\bar{b}$  $\bar{c}$     $\bar{b}$  $\bar{c}$     $\bar{b}$  $\bar{c}$
0        0          0             0          1             0          0
1        0          1             1          1             0          0
2        0          0             0          0             0          0
3        0          0             0          1             0          1
4        0          0             0          0             0          0
5        0          1             0          0             0          0
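The shift-and-add approximation of (21) can be sketched as follows for the first two rows of Table 4 and Table 5. The sketch rests on two assumptions that are not stated explicitly above: that Sign(x, s) negates x when the sign value s is 1 and leaves it unchanged when s is 0, and that a tabulated shift of 0 is applied literally as a shift by zero bits. The names `SHIFTS`, `SIGNS`, `sign` and `approx_multiply` are hypothetical.

```python
# Rows of Table 4 and Table 5 for QP % 6 equal to 0 and 1;
# the remaining rows follow the same layout.
SHIFTS = {
    0: {"f": (11, 13, 14, 1), "g": (13, 0, 7, 0), "h": (12, 10, 7, 0)},
    1: {"f": (13, 14, 10, 1), "g": (14, 10, 9, 1), "h": (12, 9, 0, 0)},
}
SIGNS = {
    0: {"f": (0, 0), "g": (0, 1), "h": (0, 0)},
    1: {"f": (0, 1), "g": (1, 1), "h": (0, 0)},
}


def sign(x, flag):
    """Assumed reading of Sign(x, flag): negate x when flag is 1."""
    return -x if flag else x


def approx_multiply(t, qp_mod_6, element):
    """Approximate M x T for one coefficient t using shifts and adds, per (21)."""
    a, b, c, d = SHIFTS[qp_mod_6][element]
    b_bar, c_bar = SIGNS[qp_mod_6][element]
    return ((t << a) + sign(t << b, b_bar) + sign(t << c, c_bar)) >> d


# With t = 1 the result is the effective multiplier realized by the shifts.
effective_row0 = tuple(approx_multiply(1, 0, k) for k in ("f", "g", "h"))
effective_row1 = tuple(approx_multiply(1, 1, k) for k in ("f", "g", "h"))
```

Under these assumptions the effective multipliers for QP %6 = 0 come out as 13312, 8065 and 5248, which can be compared against the exact values 13107, 8066 and 5243 in Table 2.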

According to the optimal solution, the quantization approximation block 420 would then be entropy coded to produce the rate signal 428. However, entropy coding algorithms such as CAVLC are highly computationally demanding operations. Entropy coding of a 4×4 quantized block involves encoding a Token (indicates the number of non-zero coefficients and the number of trailing 1's), signs of the trailing 1's, Levels of the non-zero coefficients, and Runs of zeros between non-zero coefficients. In one embodiment, the entropy coding is eliminated by using the Fast Bits Estimation Method (FBEM) to estimate the rate. According to FBEM, the number of bits taken by the different elements can be derived from the number of non-zero coefficients ($N_C$), the number of zeros ($N_Z$), and the sum of absolute levels (SAL).

$\mathrm{Rate}(x,y,z,Q) = \mathrm{Token\_Bits} + \mathrm{Sign\_Bits} + \mathrm{Level\_Bits} + \mathrm{Run\_Bits} \qquad (24)$

$\mathrm{Token\_Bits} = N_C \qquad (25)$

$\mathrm{Sign\_Bits} = N_C \qquad (26)$

$\mathrm{Level\_Bits} = \mathrm{SAL} \qquad (27)$

$\mathrm{Run\_Bits} = N_C + N_Z \qquad (28)$

$N_C = \sum_{i=0}^{n} \left( \mathrm{Scan}(l(x,y,z,Q)) \neq 0 \right) \qquad (29)$

$N_Z = \sum_{i=0}^{n} \left( \mathrm{Scan}(l(x,y,z,Q)) = 0 \right) \qquad (30)$

where Scan( ) represents the zig-zag scan, and

$\mathrm{SAL} = \sum_{i=0}^{16} \left| l(x,y,z,Q) \right| \qquad (31)$

Thus, a Rate 428 can be determined for each candidate macroblock partition 406 through an entropy coding approximation 424.
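The rate estimate of (24) through (31) can be sketched as follows. The sketch assumes that the sums run over all sixteen coefficients of the 4×4 block, in which case the zig-zag scan order does not change the counts; the text does not define the upper limit n, so this is an assumption. The function name `fbem_rate` and the sample block are hypothetical.

```python
def fbem_rate(block):
    """Estimate the bits for a quantized 4x4 block from N_C, N_Z and SAL."""
    levels = [v for row in block for v in row]
    n_c = sum(1 for v in levels if v != 0)   # (29) number of non-zero coefficients
    n_z = sum(1 for v in levels if v == 0)   # (30) number of zero coefficients
    sal = sum(abs(v) for v in levels)        # (31) sum of absolute levels
    token_bits = n_c                         # (25)
    sign_bits = n_c                          # (26)
    level_bits = sal                         # (27)
    run_bits = n_c + n_z                     # (28)
    return token_bits + sign_bits + level_bits + run_bits  # (24)


# Illustrative quantized block with four non-zero levels.
quantized = [
    [4, 2, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
    [-2, 0, 0, 0],
]
rate = fbem_rate(quantized)
```

Expanding (24) gives three times the number of non-zero coefficients plus the number of zero coefficients plus SAL, which for this block is 3×4 + 12 + 9.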

According to the optimal solution, L would also need to be inverse-quantized 322 and inverse-transformed 326. Similar to quantization, inverse quantization is also computationally complex. In one embodiment, these processes are simplified through an inverse quantization approximation. The inverse quantization approximation is achieved by performing the same steps as the quantization approximation, but with a second quantization parameter.

$L'(i,j) = \left( \left( (T(i,j) \ll a) + \mathrm{Sign}(T(i,j) \ll b, \bar{b}) + \mathrm{Sign}(T(i,j) \ll c, \bar{c}) \right) \gg d + R \right) \gg S \qquad (32)$

In one embodiment, the second quantization parameter is chosen such that S=15, which is approximately equivalent to calculating the zero-distortion value.

By doing the above steps, inverse quantization 322 has been significantly simplified and inverse transformation 326 is no longer necessary. It is appreciated that because embodiments achieve the inverse quantization approximation through quantization approximation with a second quantization parameter, both L and L′ can be generated from the same circuitry, module, etc.

In one embodiment, once the inverse quantization approximation block L′ 422 has been generated, the Distortion 430, Distortion(x,y,z,Q) or Distortion(Q), can be represented by the squared-error between L′ and L. (L′−L) represents the quantization error and has a small dynamic range. Hence embodiments can store the squared values in a lookup-table to avoid the squaring operation.

$\mathrm{Distortion}(x,y,z,Q) = \sum_{i,j} \left( L'(i,j) - L(i,j) \right)^{2} \qquad (33)$
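The lookup-table form of (33) can be sketched as follows. The bound `MAX_ERROR` on the magnitude of (L′−L) is a hypothetical value chosen for illustration, since the text states only that the dynamic range is small; the names `SQUARES` and `distortion` are likewise hypothetical.

```python
MAX_ERROR = 255  # assumed upper bound on |L' - L|; not specified in the text

# Precomputed squares so that no multiplication is needed per coefficient.
SQUARES = [e * e for e in range(MAX_ERROR + 1)]


def distortion(L_prime, L):
    """Squared error between L' and L, per (33), using the lookup table."""
    total = 0
    for row_p, row_l in zip(L_prime, L):
        for lp, l in zip(row_p, row_l):
            # Raises IndexError if the error exceeds the assumed bound.
            total += SQUARES[abs(lp - l)]
    return total


# Illustrative values only.
d = distortion([[4, 2], [0, -1]], [[3, 2], [1, 1]])
```

Indexing by the absolute difference keeps the table half the size it would otherwise need, since the square of an error does not depend on its sign.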

In one embodiment, the Lagrangian cost for each of the integral number of four by four blocks Cost4×4(x,y,z,Q,λ) is calculated for a predefined λ.

Cost4×4(x,y,z,Q,λ) = Distortion(x,y,z,Q) + λ×Rate(x,y,z,Q)   (34)

In one embodiment, the total cost for p(x,y) is given by:

$\mathrm{Cost}(x,y,Q,\lambda) = \sum_{z} \mathrm{Cost4{\times}4}(x,y,z,Q,\lambda), \quad z \in \left[0, \frac{A \times B}{16} - 1\right] \qquad (35)$

Finally, the motion vector (X,Y) is then selected as follows:

(X,Y) = (x,y) | min Cost(x,y,Q,λ)   (36)

Thus, the above embodiments are able to accurately approximate the rate and distortion for each candidate macroblock partition. The embodiments may select the best possible predictive macroblock partition with more certainty than the SAD cost function because the selection process specifically accounts for Rate and Distortion. Therefore, the embodiments are able to achieve a higher signal to noise ratio than SAD for a given bitrate, as illustrated in FIG. 5 and FIG. 6.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present description. Various modifications to these embodiments may be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the description. Thus, the present description is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

1. A method for selecting a predictive macroblock partition in motion estimation and compensation in a video encoder comprising: determining a bit rate signal; generating a distortion signal; calculating a cost based on said bit rate signal and said distortion signal; and determining a motion vector from said cost, wherein said motion vector designates said predictive macroblock partition.
2. The method as recited in claim 1 wherein said determining of said bit rate signal comprises: generating a residual error signal by subtracting a macroblock of a current frame from a section of a search area; dividing said residual error signal into an integral number of blocks; transforming said integral number of blocks into the frequency domain from the spatial domain to create a transform of said integral number of blocks; generating a quantization approximation block through a combination of shifts and adds of said transform of said integral number of blocks, said quantization approximation block having a number of non-zero coefficients and a number of zero value coefficients, wherein said combination of shifts and adds is based on a quantization parameter; calculating a sum of absolute levels of said quantization approximation block; and calculating said bit rate signal for said quantization block by summing three times said number of non-zero coefficients plus said number of zero-value coefficients plus said sum of absolute levels.
3. The method as recited in claim 2 wherein said blocks of said integral number of blocks have dimension of four pixels by four pixels.
4. The method as recited in claim 1 wherein said generating of said distortion signal comprises: calculating a residual error signal by subtracting a macroblock of a current frame from a section of a search area; dividing said residual error signal into an integral number of blocks; transforming said integral number of blocks into the frequency domain from the spatial domain, creating a transform of said integral number of blocks; generating a quantization approximation block through a first combination of shifts and adds of said transform of said integral number of blocks, wherein said first combination of shifts and adds is based on a first quantization parameter; generating an inverse quantization approximation block through a second combination of shifts and adds of said transform of said integral number of blocks, wherein said second combination of shifts and adds is based on a second quantization parameter; and determining a squared-error between said quantization approximation block and said inverse quantization approximation block, wherein said distortion signal equals said squared-error.
5. The method as recited in claim 4 wherein said blocks of said integral number of blocks have dimension of four pixels by four pixels.
6. The method as recited in claim 4 further comprising: storing possible values of said squared error in a lookup-table in a memory.
7. The method as recited in claim 1 wherein said calculating of said cost comprises: calculating a Lagrangian cost by summing said distortion signal and said bit rate signal multiplied by a Lagrangian multiplier; and performing a summation of said Lagrangian cost over each of said integral number of blocks, wherein said cost is equal to said summation.
8. The method as recited in claim 7 wherein said blocks of said integral number of blocks have dimension of four pixels by four pixels.
9. The method as recited in claim 1 wherein said determining of said motion vector comprises: scanning said cost for a lowest value, wherein said motion vector is defined as a vector corresponding to said lowest value.
10. The method as recited in claim 1 wherein said video encoder is an H.264/AVC encoder.
 11. A method for selecting a predictive macroblock partitionfrom a plurality of candidate macroblock partitions in motion estimationand compensation in a video encoder comprising: determining a bit ratesignal for each of said candidate macroblock partitions; generating adistortion signal for each of said candidate macroblock partitions;calculating a cost for each of said candidate macroblock partitionsbased on a respective bit rate signal and respective distortion signalto produce a plurality of costs; and determining a motion vector fromsaid costs, wherein said motion vector designates said predictivemacroblock partition.
 12. The method as recited in claim 11 wherein saiddetermining of said bit rate signal comprises: per candidate macroblockpartition, generating a residual error signal by subtracting amacroblock of a current frame from respective candidate macroblockpartition; per candidate macroblock partition, dividing respectiveresidual error signal into an integral number of blocks; per candidatemacroblock partition, transforming respective integral number of blocksinto the frequency domain from the spatial domain, creating a pluralityof transforms; per transform, generating a quantization approximationblock through a combination of shifts and adds of respective transform,said quantization approximation block having a number of non-zerocoefficients and a number of zero value coefficients, wherein saidcombination of shifts and adds is based on a quantization parameter; pertransform, calculating a sum of absolute levels of said quantizationapproximation block; and per transform, calculating said bit rate signalby summing three times respective number of non-zero coefficients plusrespective number of zero-value coefficients plus respective sum ofabsolute levels.
13. The method as recited in claim 12 wherein said blocks of said integral number of blocks have dimension of four pixels by four pixels.
14. The method as recited in claim 11 wherein said generating of said distortion signal comprises: per candidate macroblock partition, generating a residual error signal by subtracting a macroblock of a current frame from respective candidate macroblock partition; per candidate macroblock partition, dividing respective residual error signal into an integral number of blocks; per candidate macroblock partition, transforming respective integral number of blocks into the frequency domain from the spatial domain, creating a plurality of transforms; per transform, generating a quantization approximation block through a first combination of shifts and adds of respective transform, wherein said first combination of shifts and adds is based on a first quantization parameter; per transform, generating an inverse quantization approximation block through a second combination of shifts and adds of respective transform, wherein said second combination of shifts and adds is based on a second quantization parameter; and per transform, determining a squared-error between respective quantization approximation block and respective inverse quantization approximation block, wherein said distortion signal equals said squared-error.
15. The method as recited in claim 14 wherein said blocks of said integral number of blocks have dimension of four pixels by four pixels.
16. The method as recited in claim 14 further comprising: storing possible values of said squared error in a lookup-table in a memory.
17. The method as recited in claim 11 wherein each of said candidate macroblock partitions has an integral number of blocks, wherein each of said blocks has a respective bit rate signal and a respective distortion signal, and wherein said calculating of said cost comprises: per block, calculating a Lagrangian cost by summing respective distortion signal and respective bit rate signal multiplied by a Lagrangian multiplier; and per candidate macroblock partition, producing said cost by summing respective Lagrangian costs of respective blocks.
18. The method as recited in claim 17 wherein said blocks have dimension of four pixels by four pixels.
19. The method as recited in claim 11 wherein said determining of said motion vector comprises: scanning said costs for a lowest value, wherein said motion vector is defined as a vector corresponding to respective candidate macroblock partition of said lowest value.
20. The method as recited in claim 11 wherein said video encoder is an H.264/AVC encoder.
21. An apparatus for selecting a predictive macroblock partition in motion estimation and compensation in a video encoder comprising: a motion compensation block to receive a macroblock of a current frame and an M by N search area and generate a residual error signal; a forward transform block coupled with said motion compensation block to receive said residual error signal and transform said residual error signal into a frequency-domain residual error signal; a quantization approximation block coupled with said forward transform block to receive said frequency-domain residual error signal and generate an approximated quantization signal based on a first quantization parameter and generate an approximated inverse quantization signal based on a second quantization parameter; an entropy coding approximation block coupled with said quantization approximation block to receive said approximated quantization signal and generate a rate signal, wherein said rate signal is used in selecting said predictive macroblock partition; and a sum of squared difference block coupled with said quantization approximation block to receive said approximated quantization signal and said approximated inverse quantization signal and generate a distortion signal, wherein said distortion signal is used in selecting said predictive macroblock partition.
22. The apparatus as recited in claim 21 further comprising: a cost determination block coupled with said entropy coding approximation block and said sum of squared difference block, wherein said cost determination block receives said rate signal and said distortion signal and generates a motion vector.
23. The apparatus as recited in claim 21 wherein said M by N search area comprises a plurality of candidate macroblock partitions and said motion compensation block generates said residual error signal per candidate macroblock partition by subtracting said macroblock from respective candidate macroblock partition.
24. The apparatus as recited in claim 21 wherein said M by N search area comprises a plurality of candidate macroblock partitions, wherein said frequency-domain residual error signal comprises a plurality of frequency-domain residual error sub-signals, wherein each frequency-domain residual error sub-signal corresponds to one of said candidate macroblock partitions, wherein said quantization approximation block generates an approximated quantization sub-signal for each frequency-domain residual error sub-signal through a combination of shifts and adds of respective frequency-domain residual error sub-signal based on said first quantization parameter, producing a plurality of approximated quantization sub-signals, wherein said approximated quantization signal comprises said plurality of approximated quantization sub-signals.
25. The apparatus as recited in claim 21 wherein said M by N search area comprises a plurality of candidate macroblock partitions, wherein said frequency-domain residual error signal comprises a plurality of frequency-domain residual error sub-signals, wherein each frequency-domain residual error sub-signal corresponds to one of said candidate macroblock partitions, wherein said quantization approximation block generates an approximated inverse quantization sub-signal for each frequency-domain residual error sub-signal through a combination of shifts and adds of respective frequency-domain residual error sub-signal based on said second quantization parameter, producing a plurality of approximated inverse quantization sub-signals, wherein said approximated inverse quantization signal comprises said plurality of approximated inverse quantization sub-signals.
26. The apparatus as recited in claim 21 wherein said M by N search area comprises a plurality of candidate macroblock partitions, wherein said approximated quantization signal comprises a plurality of approximated quantization sub-signals, wherein each of said approximated quantization sub-signals corresponds to one of said candidate macroblock partitions and has a number of non-zero coefficients and a number of zero-value coefficients, wherein said entropy coding approximation block generates a rate sub-signal for each approximated quantization sub-signal by calculating a sum of absolute levels of respective approximated quantization sub-signal and summing respective sum of absolute levels plus three times respective number of non-zero coefficients plus respective number of zero-value coefficients, producing a plurality of approximated quantization sub-signals, wherein said approximated quantization signal comprises said plurality of approximated quantization sub-signals.
27. The apparatus as recited in claim 21 wherein said M by N search area comprises a plurality of candidate macroblock partitions, wherein said approximated quantization signal comprises a plurality of approximated quantization sub-signals, wherein each of said approximated quantization sub-signals corresponds to one of said candidate macroblock partitions, wherein said approximated inverse quantization signal comprises a plurality of approximated inverse quantization sub-signals, wherein each of said approximated inverse quantization sub-signals corresponds to one of said candidate macroblock partitions, wherein said sum of squared difference block generates a distortion sub-signal for each candidate macroblock partition by determining a squared-error between said approximated quantization sub-signal and said approximated inverse quantization sub-signal to produce a plurality of distortion sub-signals, wherein said distortion signal comprises said plurality of distortion sub-signals.
28. The apparatus as recited in claim 27 further comprising a memory coupled with said sum of squared difference block for storing possible values of said squared error in a lookup-table.
29. The apparatus as recited in claim 22 wherein said M by N search area comprises a plurality of candidate macroblock partitions, wherein said rate signal comprises a plurality of rate sub-signals, wherein each of said rate sub-signals corresponds to one of said candidate macroblock partitions, wherein said distortion signal comprises a plurality of distortion sub-signals, wherein each of said distortion sub-signals corresponds to one of said candidate macroblock partitions, wherein said cost determination block generates said motion vector by calculating a Lagrangian cost for each candidate macroblock partition by summing respective distortion signal and respective rate signal multiplied by a Lagrangian multiplier, producing a plurality of Lagrangian costs, and scanning said Lagrangian costs for a lowest value, wherein said motion vector is defined as a vector corresponding to respective candidate macroblock partition of said lowest value.
30. The apparatus as recited in claim 21 wherein said video encoder is an H.264/AVC encoder.
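
By way of non-limiting illustration only, the rate estimate recited in claims 12 and 26, the Lagrangian cost recited in claims 17 and 29, and the lowest-cost scan recited in claims 19 and 29 may be sketched in Python as follows. The function names, the representation of a block as a flat sequence of quantized levels, and the sample values in the usage note are assumptions made for this sketch and do not appear in the claims. The transform and the shift-and-add quantization approximation are not modeled; the sketch assumes the per-block quantized levels and distortion values have already been produced.

```python
from typing import List, Sequence, Tuple

MotionVector = Tuple[int, int]
# One (distortion, rate) pair per block of a candidate partition (hypothetical layout).
BlockStats = Tuple[float, float]


def estimate_rate(levels: Sequence[int]) -> int:
    """Rate estimate for one block of quantized levels:
    3 * (number of non-zero coefficients)
    + (number of zero-value coefficients)
    + (sum of absolute levels)."""
    nonzero = sum(1 for v in levels if v != 0)
    zeros = len(levels) - nonzero
    sum_abs = sum(abs(v) for v in levels)
    return 3 * nonzero + zeros + sum_abs


def lagrangian_cost(distortion: float, rate: float, lam: float) -> float:
    """Per-block Lagrangian cost: distortion plus rate times the multiplier."""
    return distortion + lam * rate


def partition_cost(blocks: Sequence[BlockStats], lam: float) -> float:
    """Cost of a candidate partition: sum of the Lagrangian costs of its blocks."""
    return sum(lagrangian_cost(d, r, lam) for d, r in blocks)


def select_motion_vector(
    candidates: List[Tuple[MotionVector, Sequence[BlockStats]]],
    lam: float,
) -> MotionVector:
    """Scan the candidate costs for the lowest value and return the
    corresponding motion vector. On a tie, the first candidate wins
    (an arbitrary choice made for this sketch)."""
    best_vector, _ = min(candidates, key=lambda c: partition_cost(c[1], lam))
    return best_vector
```

As a usage example under the same assumptions, a 4x4 block holding the levels 3 and -1 followed by fourteen zeros gives `estimate_rate` = 3*2 + 14 + 4 = 24. With a multiplier of 0.5, a candidate with distortion 10 and rate 24 costs 22, a candidate with distortion 12 and rate 16 costs 20, and the scan selects the second candidate's vector.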